Accurate Schema Matching on Streams

Authors

  • Szymon Jaroszewicz
  • Lenka Ivantysynova
  • Tobias Scheffer
Abstract

We address the problem of matching imperfectly documented schemas of data streams and large databases. Instance-level schema matching algorithms identify likely correspondences between attributes by quantifying the similarity of their corresponding values. However, computing these similarities exactly requires processing all database records, which is infeasible for data streams. We devise a fast matching algorithm that uses only a small sample of records yet is guaranteed to match the most similar attributes with high probability. The method can be applied to any given similarity metric, or combination of metrics, that can be estimated from a sample with bounded error; we apply the algorithm to several metrics. We give a rigorous proof of the method's correctness and report on experiments using large databases.
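
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of matching from a sample with bounded error, not the authors' method. It assumes (our assumption, not stated in the source) that a similarity score is the mean of per-record values in [0, 1], so that Hoeffding's inequality bounds the estimation error. The function names `hoeffding_radius`, `sample_size`, and `best_match_if_separated` are hypothetical.

```python
import math


def hoeffding_radius(n, delta):
    """Half-width of a two-sided Hoeffding confidence interval.

    Assumes the estimate is the mean of n independent values in [0, 1];
    the true mean lies within this radius with probability >= 1 - delta.
    """
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n))


def sample_size(epsilon, delta):
    """Smallest n for which hoeffding_radius(n, delta) <= epsilon."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))


def best_match_if_separated(estimates, n, delta):
    """Return the top candidate only if the sample is large enough to trust it.

    estimates: dict mapping a candidate attribute name to its sampled
    similarity (hypothetical input format). The error budget delta is
    split across all candidates with a union bound. Returns None when
    the confidence intervals of the top two candidates still overlap,
    meaning more records would have to be sampled.
    """
    k = len(estimates)
    radius = hoeffding_radius(n, delta / k)
    ranked = sorted(estimates.items(), key=lambda kv: kv[1], reverse=True)
    best_name, best_value = ranked[0]
    if k == 1 or best_value - radius > ranked[1][1] + radius:
        return best_name
    return None
```

For example, `best_match_if_separated({"zip": 0.9, "phone": 0.3}, n=1000, delta=0.05)` returns `"zip"`, while the same estimates with `n=10` return `None` because the intervals are still too wide to separate the two candidates.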

Similar resources

An Improved Semantic Schema Matching Approach

Schema matching is a critical step in many applications, such as data warehouse loading, Online Analytical Processing (OLAP), data mining, the semantic web [2], and schema integration. The task is to find the semantic correspondences between elements of two schemas. Recently, schema matching has attracted considerable interest in both research and practice. In this paper, we present a new impr...

Towards Efficient Schema-Enhanced Pattern Matching over RDF Data Streams

Data streams, often seen as sources of events, have appeared on the Web. Event processing on the Web, however, needs to cope with the typical openness and heterogeneity of the Web environment. Semantic Web technology, meant to facilitate data integration in an open environment, can help to address heterogeneities across multiple streams. In this paper we discuss an approach towards efficient patt...

Schema matching on streams with accuracy guarantees

We address the problem of matching imperfectly documented schemas of data streams and large databases. Instance-level schema matching algorithms identify likely correspondences between attributes by quantifying the similarity of their corresponding values. However, exact calculation of these similarities requires processing of all database records – which is infeasible for data streams. We devis...

Approximate Common Structures in XML Schema Matching

This paper describes a matching algorithm that can find accurate matches and scales to large XML Schemas with hundreds of nodes. We model XML Schemas as labeled, unordered and rooted trees, and turn the schema matching problem into a tree matching problem. We develop a tree matching algorithm based on the concept of Approximate Common Structures. Compared with the tree edit-distance algorithm a...

Toward the Scalable Integration of Internet

This dissertation in a broad sense focuses on understanding the fundamental aspects of building a large-scale information integration system that can answer complex queries over a large number of heterogeneous Internet data sources. Among many challenges in achieving this goal, we focus on two key issues: efficient query processing and schema matching. Most of the data the integration system pr...

Publication date: 2006